1 Information extraction and text segmentation: AuGEAS: authoritativeness grading,
estimation, and sorting
Ayman Farahat, Geoff Nunberg, Francine Chen 

November 2002 Proceedings of the eleventh international conference on Information 
and knowledge management 

Full text available: pdf(135.62 KB) Additional Information: full citation, abstract, references, citings

When searching for content in a large heterogeneous document collection like the World
Wide Web it is not easy to know which documents provide reliable authoritative information
about a subject. The problem is particularly pointed as it concerns content search for
"high-value" informational needs such as retrieving medical information, where the cost of
error may be high. In this paper, a method is described for estimating the authoritativeness
of a document based on textual, non-topi ...



2 Link Analysis: Web page scoring systems for horizontal and vertical search
Michelangelo Diligenti, Marco Gori, Marco Maggini 

May 2002 Proceedings of the eleventh international conference on World Wide Web 

Full text available: pdf(243.92 KB) Additional Information: full citation, abstract, references, index terms

Page ranking is a fundamental step towards the construction of effective search engines for 
both generic (horizontal) and focused (vertical) search. Ranking schemes for horizontal 
search like the PageRank algorithm used by Google operate on the topology of the graph, 
regardless of the page content. On the other hand, the recent development of vertical 
portals (vortals) makes it useful to adopt scoring systems focussed on the topic and taking 
the page content into account. In ...



Keywords: Focused PageRank, HITS, PageRank, random walks, web page scoring systems 



3 Link analysis: Link fusion: a unified link analysis framework for multi-type interrelated
data objects

Wensi Xi, Benyu Zhang, Zheng Chen, Yizhou Lu, Shuicheng Yan, Wei-Ying Ma, Edward Allan 
Fox 

May 2004 Proceedings of the 13th international conference on World Wide Web 

Full text available: pdf(510.05 KB) Additional Information: full citation, abstract, references, index terms

Web link analysis has proven to be a significant enhancement for quality based web search. 
Most existing links can be classified into two categories: intra-type links (e.g., web 
hyperlinks), which represent the relationship of data objects within a homogeneous data 
type (web pages), and inter-type links (e.g., user browsing log) which represent the 
relationship of data objects across different data types (users and web pages). 
Unfortunately, most link analysis research only considers one type of ... 
4 Information retrieval session 7: web: Representing interests as a hyperlinked document
collection

Michelle Fisher, Richard Everson 

November 2003 Proceedings of the twelfth international conference on Information and 
knowledge management 

Full text available: pdf(111.85 KB) Additional Information: full citation, abstract, references, index terms

We describe a latent variable model for representing a user's interests as a hyperlinked 
document collection. By collecting hyper-text documents that a user views, creates or 
updates whilst at their computer, we are able to use not only the content of these 
documents but also the inter-connectivity of the collection to model the user's interests. The 
model uses Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced Topic 
Selection and decomposes the user's document collection ... 

Keywords: hyperlinked/hypertext document collections, information access, latent variable 
models, user interests 



5 Local versus global link information in the Web

Pavel Calado, Berthier Ribeiro-Neto, Nivio Ziviani, Edleno Moura, Ilmerio Silva 
January 2003 ACM Transactions on Information Systems (TOIS), volume 21 issue 1 

Full text available: pdf(413.06 KB) Additional Information: full citation, abstract, references, citings, index terms

Information derived from the cross-references among the documents in a hyperlinked 
environment, usually referred to as link information, is considered important since it can be 
used to effectively improve document retrieval. Depending on the retrieval strategy, link 
information can be local or global. Local link information is derived from the set of 
documents returned as answers to the current user query. Global link information is derived 
from all the documents in the collection. In th ... 

Keywords: Belief networks, World Wide Web, link analysis, local and global information 



6 Web Information Retrieval: The Importance of Prior Probabilities for Entry Page Search
Wessel Kraaij, Thijs Westerveld, Djoerd Hiemstra 

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(135.87 KB) Additional Information: full citation, abstract, references, citings, index terms

An important class of searches on the world-wide-web has the goal to find an entry page 
(homepage) of an organisation. Entry page search is quite different from Ad Hoc search. 
Indeed a plain Ad Hoc system performs disappointingly. We explored three non-content 
features of web pages: page length, number of incoming links and URL form. Especially the 
URL form proved to be a good predictor. Using URL form priors we found over 70% of all 
entry pages at rank 1, and up to 89% in the top 10. Non-conten ... 

Keywords: URLs, entry page search, language models, links, parameter estimation, prior 
probabilities 



7 Stable algorithms for link analysis 

Andrew Y. Ng, Alice X. Zheng, Michael I. Jordan 

September 2001 Proceedings of the 24th annual international ACM SIGIR conference on 
Research and development in information retrieval 

The Kleinberg HITS and the Google PageRank algorithms are eigenvector methods for 
identifying "authoritative" or "influential" articles, given hyperlink or citation information.
That such algorithms should give reliable or consistent answers is surely a desideratum, and 
in~\cite{ijcaiPaper}, we analyzed when they can be expected to give stable rankings under 
small perturbations to the linkage patterns. In this paper, we extend the analysis and show 
how it gives insight into ways of de ... 



8 Research track papers: Web usage mining based on probabilistic latent semantic
analysis

Xin Jin, Yanzan Zhou, Bamshad Mobasher 

August 2004 Proceedings of the 2004 ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(747.27 KB) Additional Information: full citation, abstract, references, index terms

The primary goal of Web usage mining is the discovery of patterns in the navigational 
behavior of Web users. Standard approaches, such as clustering of user sessions and 
discovering association rules or frequent navigational paths, do not generally provide the 
ability to automatically characterize or quantify the unobservable factors that lead to 
common navigational patterns. It is, therefore, necessary to develop techniques that can 
automatically discover hidden semantic relationships among use ... 

Keywords: PLSA, Web usage mining, user profiling 



9 Formal specification for a clinical cyclotron control system 
Jonathan Jacky 

April 1990 ACM SIGSOFT Software Engineering Notes, Conference proceedings on
Formal methods in software development, volume 15 issue 4

Full text available: pdf(1.15 MB) Additional Information: full citation, references, index terms



10 Structure of mathematical programming systems
WM. Orchard Hays 

January 1968 Proceedings of the 1968 23rd ACM national conference 

Full text available: pdf(1.47 MB) Additional Information: full citation, abstract, index terms

A mathematical programming system (MPS), as now implemented on third generation 
computers, constitutes four separate subject areas: 1. Algorithmic and procedural 
capabilities 2. Problem formulation and solution techniques 3. Programming languages 4. 
System structure and use. Each of these areas involves extensive considerations and we
cannot do justice to any of them in the time available. Since problem formulation and solution
techniqu ... 
11 Link-based ranking 2: Searching the workplace web
Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, 
David P. Williamson 

May 2003 Proceedings of the twelfth international conference on World Wide Web 

Full text available: pdf(231.55 KB) Additional Information: full citation, abstract, references, citings, index terms

The social impact from the World Wide Web cannot be underestimated, but technologies 
used to build the Web are also revolutionizing the sharing of business and government 
information within intranets. In many ways the lessons learned from the Internet carry over 
directly to intranets, but others do not apply. In particular, the social forces that guide the 
development of intranets are quite different, and the determination of a "good answer" for 
intranet search is quite different than on the Int ... 
12 A survey of Web metrics

Devanshu Dhyani, Wee Keong Ng, Sourav S. Bhowmick 
December 2002 ACM Computing Surveys (CSUR), volume 34 issue 4 

Full text available: pdf(289.28 KB) Additional Information: full citation, abstract, references, index terms

The unabated growth and increasing significance of the World Wide Web has resulted in a 
flurry of research activity to improve its capacity for serving information more effectively. 
But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" of 
Web resources and services. This observation points towards measurements and models 
that quantify various attributes of web sites. The science of measuring all aspects of 
information, especially its storage and retrieval or ... 

Keywords: Information theoretic, PageRank, Web graph, Web metrics, Web page similarity, 
quality metrics 
13 Document Databases: Requirements for XML document database systems
Airi Salminen, Frank Wm. Tompa 

November 2001 Proceedings of the 2001 ACM Symposium on Document engineering 

Full text available: pdf(141.89 KB) Additional Information: full citation, abstract, references, citings, index terms

The shift from SGML to XML has created new demands for managing structured documents. 
Many XML documents will be transient representations for the purpose of data exchange 
between different types of applications, but there will also be a need for effective means to 
manage persistent XML data as a database. In this paper we explore requirements for an 
XML database management system. The purpose of the paper is not to suggest a single type 
of system covering all necessary features. Instead the pur ... 

Keywords: XML, XML database systems, data definition, data manipulation, data modelling, 
structured documents 



14 Finding authorities and hubs from link structures on the World Wide Web 
Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosenthal, Panayiotis Tsaparas 
April 2001 Proceedings of the tenth international conference on World Wide Web 

Full text available: pdf(269.01 KB) Additional Information: full citation, references, citings, index terms



Keywords: Bayesian, Kleinberg's algorithm, SALSA, authorities, hubs, link analysis, 
threshold, web searching 



15 Does "authority" mean quality? predicting expert quality ratings of Web documents
Brian Amento, Loren Terveen, Will Hill 

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(773.39 KB) Additional Information: full citation, abstract, references, citings, index terms

For many topics, the World Wide Web contains hundreds or thousands of relevant 
documents of widely varying quality. Users face a daunting challenge in identifying a small 
subset of documents worthy of their attention. 

Link analysis algorithms have received much interest recently, in large part for their 
potential to identify high quality items. We report here on an experimental evaluation of this 
potential. 

We evaluated a number of link and content-based algorithms using a dat ... 
16 Global surveillance: the evidence for Echelon
Duncan Campbell 

April 2000 Proceedings of the tenth conference on Computers, freedom and privacy: 
challenging the assumptions 

Full text available: pdf(112.90 KB) Additional Information: full citation, index terms



17 Authoritative sources in a hyperlinked environment 
Jon M. Kleinberg 

January 1998 Proceedings of the ninth annual ACM-SIAM symposium on Discrete 
algorithms 

Full text available: pdf(843.14 KB) Additional Information: full citation, references, citings, index terms



18 Learners as authors: helping ESL employees in a Canadian bank prepare customer
relations and documentation material
Paul Beam, Diane Burke 

October 1994 Proceedings of the 12th annual international conference on Systems 
documentation: technical communications at the great divide 

Full text available: pdf(941.19 KB) Additional Information: full citation, references, index terms